Text File | 1994-11-02 | 4.4 KB | 131 lines | [ttro/ttxt] |
-
- _________________________________________________
-
- Tokenize Scripting Addition ver. 1.1
-
- Copyright (C) 1994 Wayne Walrath
-
- _________________________________________________
-
- This software is free for personal use. To obtain a cheap and simple license for
- corporate, commercial or institutional use, contact the author at one of the
- addresses listed at the end of this document. THIS SOFTWARE IS PROVIDED AS IS
- WITHOUT WARRANTIES. USE AT YOUR OWN RISK! You are encouraged to share this
- software with other people and to upload it to online services, but you may not
- charge money for it and you should only transfer the complete package. Contact
- me if you doubt whether you have a complete package. Inclusion on CD-ROMs
- requires explicit permission from me (the author).
-
-
- The demo AppleScript included with the distribution contains many examples of
- Tokenize's usage.
-
- Tokenize was designed to make it easier to split text into elements based on a
- set of delimiters. The demo AppleScript illustrates several novel uses for
- Tokenize which may not be obvious at first glance.
-
-
- INSTALLATION:
- ______________________
- To install: Drag Tokenize to the Scripting Additions folder inside the
- Extensions folder.
-
-
-
- BACKGROUND INFORMATION
- ______________________
-
- Because of the way the tokenization is implemented, Tokenize can also be used
- as a quick way of removing unwanted characters from a text string. To better
- understand what is possible with Tokenize, here's a brief description of how
- Tokenize functions. The text to be tokenized is scanned for each of the strings
- given in the delimiter list, and all occurrences of these strings are replaced
- by a special character (essentially a null-char). After all delimiters are
- processed, a final pass is made which gathers all the strings between the
- special characters into a list. Understanding this algorithm will help you
- figure out how text will ultimately be parsed when using Tokenize.
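-
- The algorithm described above can be modelled in a few lines. This is only a
- sketch in Python, not the actual Tokenize source: the name tokenize_model and
- the NUL sentinel are assumptions made for illustration, and dropping empty
- tokens is inferred from the sample output in this document rather than stated
- by it.

```python
SENTINEL = "\0"  # stands in for the "special character" in the description

def tokenize_model(text, delimiters):
    """Hypothetical model of the described algorithm, not the real Tokenize code."""
    if isinstance(delimiters, str):
        delimiters = [delimiters]  # a lone delimiter is treated as a one-item list
    # One pass per delimiter, in list order: replace each occurrence with the sentinel.
    for delim in delimiters:
        if delim:  # skip empty strings, which str.replace would match everywhere
            text = text.replace(delim, SENTINEL)
    # Final pass: gather the non-empty strings between sentinels into a list.
    return [piece for piece in text.split(SENTINEL) if piece]

# Removing unwanted characters: tokenize on them, then join the pieces.
cleaned = "".join(tokenize_model("$1,234.50", ["$", ","]))
print(cleaned)  # prints 1234.50
```

- Because the delimiters are applied one after another, their order in the list
- can matter in this model whenever one delimiter is a substring of another.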
-
-
- For example, consider an arbitrary string of text which contains words
- separated by tab characters, and between each word there will be one to three
- tabs. Here's a string set up as described:
-
- set testString to "One\tGiant\t\tStep\tFor\t\t\tMankind"
-
-
- If I tokenize this string using tab as the only delimiter, it returns this
- list:
-
- tokenize testString with delimiters tab
- => {"One", "Giant", "Step", "For", "Mankind"}
-
-
- If, on the other hand, I tokenize using a string of three tabs, the output is
- different:
-
- tokenize testString with delimiters tab & tab & tab
- => {"One\tGiant\t\tStep\tFor", "Mankind"}
-
-
- The output from this version is a list of two strings. Since tokenize found
- only one place in testString where three tab characters sit side by side, it
- split the string there, and the single and double tabs stay inside the first
- item. Tokenizing with a two-tab string would produce yet another result.
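-
- The three cases can be compared side by side with a small Python model of the
- algorithm. This is a sketch under assumptions, not Tokenize itself: the helper
- tokenize_model is hypothetical, and it assumes delimiters are matched left to
- right without overlap (the behaviour of Python's str.replace). The real
- scripting addition may treat the leftover tab in the two-tab case differently.

```python
def tokenize_model(text, delimiters):
    """Hypothetical model: replace each delimiter with NUL, then gather the pieces."""
    for delim in delimiters:
        text = text.replace(delim, "\0")
    return [piece for piece in text.split("\0") if piece]

test_string = "One\tGiant\t\tStep\tFor\t\t\tMankind"

by_one = tokenize_model(test_string, ["\t"])
by_two = tokenize_model(test_string, ["\t\t"])
by_three = tokenize_model(test_string, ["\t\t\t"])

print(by_one)    # ['One', 'Giant', 'Step', 'For', 'Mankind']
print(by_two)    # under this model: ['One\tGiant', 'Step\tFor', '\tMankind']
print(by_three)  # ['One\tGiant\t\tStep\tFor', 'Mankind']
```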
-
-
-
- USAGE:
- ______________________
-
- tokenize <a String> with delimiters { [<sep. string 1>] [,<sep. string 2>]...}
-
- The direct parameter to tokenize is a string, and the second (required)
- parameter is a list of strings (one or more bytes in length) to use in
- tokenizing the direct parameter.
-
- If you are only tokenizing with one delimiter you need not pass it as a list
- since AppleScript will handle the coercion for you. For example, the following
- is legal:
-
- tokenize "My Name Is" with delimiters " Name "
- => {"My","Is"}
-
-
- Some text processing tasks require more than one call to Tokenize to perform.
- As an example, if the variable myText contained a number of lines separated by
- return characters, and you wanted to retrieve the words from line five, you
- could write the following AppleScript commands:
-
- tokenize myText with delimiters {return}
- tokenize (item 5 of result) with delimiters {space}
- => [result is a list with all the words from line five of the text]
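-
- The same two-step idea, sketched in Python with a hypothetical tokenize_model
- helper that imitates the described algorithm (it is not the scripting addition
- itself, and the sample text is invented). Note that if empty tokens are
- dropped, as the tab example in this document suggests, blank lines would not
- count toward "line five".

```python
def tokenize_model(text, delimiters):
    """Hypothetical model: replace each delimiter with NUL, then gather the pieces."""
    for delim in delimiters:
        text = text.replace(delim, "\0")
    return [piece for piece in text.split("\0") if piece]

# Sample text with lines separated by return characters ("\r").
my_text = "\r".join([
    "line one",
    "line two",
    "line three",
    "line four",
    "the fifth line here",
    "line six",
])

lines = tokenize_model(my_text, ["\r"])    # first call: split into lines
words = tokenize_model(lines[4], [" "])    # AppleScript's item 5 is index 4 here
print(words)  # ['the', 'fifth', 'line', 'here']
```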
-
-
- ______________________
- Comments, bug reports and suggestions are welcomed. If you have any ideas for
- useful Scripting Additions which haven't been written yet, send me a message
- describing your idea.
-
-
- VERSION HISTORY:
- ______________________
- VER 1.1 - 2Nov94
- Fixed bug which surfaced in ver 1.0 when the tokens were longer than 255 chars.
- This bug resulted in random memory being stomped on when the token was too long.
- All users of version 1.0 should switch immediately.
-
- Cleaned the code up a bit and optimized it a bit.
-
- VER 1.0 - Oct94
- First release.
-
-
- ___________________________
- Wayne Walrath
- 2010 Ravenswood Dr.
- Evansville, IN 47714
- (812) 476-8610
- walrath@cs.indiana.edu
- CIS: 70233,3151
- ___________________________
-